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Abstract 

Imaging spectrometers measure electromagnetic energy scattered in their instantaneous field view in 
hundreds or thousands of spectral channels with higher spectral resolution than multispectml cameras. 
Imaging spectrometers are therefore often referred to as hyperspectral cameras (HSCs). Higher spectral 
resolution enables material identification via spectroscopic analysis, which faciUtates countless applica- 
tions that require identifying materials in scenarios unsuitable for classical spectroscopic analysis. Due to 
low spatial resolution of HSCs, microscopic material mixing, and multiple scattering, spectra measured 
by HSCs are mixtures of spectra of materials in a scene. Thus, accurate estimation requires unmixing. 
Pixels are assumed to be mixtures of a few materials, called endmembers. Unmixing involves estimating 
all or some of: the number of endmembers, their spectral signatures, and their abundances at each 
pixel. Unmixing is a challenging, ill-posed inverse problem because of model inaccuracies, observation 
noise, environmental conditions, endmember variability, and data set size. Researchers have devised and 
investigated many models searching for robust, stable, tractable, and accurate urmiixing algorithms. This 
paper presents an overview of urmiixing methods from the time of Keshava and Mustard's unmixing 
tutorial [1] to the present. Mixing models are first discussed. Signal-subspace, geometrical, statistical, 
sparsity-based, and spatial-contextual urmiixing algorithms are described. Mathematical problems and 
potential solutions are described. Algorithm characteristics are illustrated experimentally. 
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Fig. 1. Hyperspectral imaging concept. 



I. Introduction 

Hyperspectral cameras contribute significantly to earth observation and remote sensing [12], 

[13]. Their potential motivates the development of small, commercial, high spatial and spectral resolution 
instruments. They have also been used in food safety [14]-[17], pharmaceutical process monitoring and 
quality control [18]-[22], and biomedical, industrial, and biometric, and forensic applications [23]-[27]. 

HSCs can be built to function in many regions of the electro-magnetic spectrum. The focus here is 
on those covering the visible, near-infrared, and shortwave infrared spectral bands (in the range 0.3/im 
to 2.5fim [5]). Disregarding atmospheric effects, the signal recorded by an HSC at a pixel is a mixture 
of light scattered by substances located in the field of view [3]. Fig. 1 illustrates the measured data. 
They are organized into planes forming a data cube. Each plane corresponds to radiance acquired over a 
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Fig. 2. Linear mixing . Tiie measured radiance at a pixel is a weighited average of tiie radiances of the materials present at the 
pixel. 
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Fig. 3. Two nonlinear mixing scenarios. Left hand: intimate mixture; Right hand: multilayered scene. 

spectral band for all pixels. Each spectral vector corresponds to the radiance acquired at a given location 
for all spectral bands. 

A. Linear and nonlinear mixing models 

Hyperspectral unmixing (HU) refers to any process that separates the pixel spectra from a hyperspectral 
image into a collection of constituent spectra, or spectral signatures, called endmembers and a set of 
fractional abundances, one set per pixel. The endmembers are generally assumed to represent the pure 
materials present in the image and the set of abundances, or simply abundances, at each pixel to represent 
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the percentage of each endmember that is present in the pixel. 

There are a number of subtleties in this definition. First, the notion of a pure material can be subjective 
and problem dependent. For example, suppose a hyperspectral image contains spectra measured from 
bricks laid on the ground, the mortar between the bricks, and two types of plants that are growing through 
cracks in the brick. One may suppose then that there are four endmembers. However, if the percentage of 
area that is covered by the mortar is very small then we may not want to have an endmember for mortar. 
We may just want an endmember for "brick". It depends on if we have a need to directly measure the 
proportion of mortar present. If we have need to measure the mortar, then we may not care to distinguish 
between the plants since they may have similar signatures. On the other hand, suppose that one type 
of plant is desirable and the other is an invasive plant that needs to be removed. Then we may want 
two plant endmembers. Furthermore, one may only be interested in the chlorophyll present in the entire 
scene. Obviously, this discussion can be continued ad nauseum but it is clear that the definition of the 
endmembers can depend upon the application. 

The second subtlety is with the proportions. Most researchers assume that a proportion represents 
the percentage of material associated with an endmember present in the part of the scene imaged 
by a particular pixel. Indeed, Hapke [28] states that the abundances in a linear mixture represent the 
relative area of the corresponding endmember in an imaged region. Lab experiments conducted by 
some of the authors have confirmed this in a laboratory setting. However, in the nonUnear case, the 
situation is not as straightforward. For example, calibration objects can sometimes be used to map 
hyperspectral measurements to reflectance, or at least to relative reflectance. Therefore, the coordinates of 
the endmembers are approximations to the reflectance of the material, which we may assume for the sake 
of argument to be accurate. The reflectance is usually not a Unear function of the mass of the material 
nor is it a linear function of the cross-sectional area of the material. A highly reflective, yet small object 
may dominate a much larger but dark object at a pixel, which may lead to inaccurate estimates of the 
amount of material present in the region imaged by a pixel, but accurate estimates of the contribution of 
each material to the reflectivity measured at the pixel. Regardless of these subtleties, the large number 
of applications of hyperspectral research in the past ten years indicates that current models have value. 

Unmixing algorithms currently rely on the expected type of mixing. Mixing models can be characterized 
as either Unear or nonlinear [1], [29]. Linear mixing holds when the mixing scale is macroscopic [30] 
and the incident light interacts with just one material, as is the case in checkerboard type scenes [31], 
[32]. In this case, the mixing occurs within the instrument itself. It is due to the fact that the resolution of 
the instrument is not fine enough. The light from the materials, although almost completely separated, is 
mixed within the measuring instrument. Fig. 2 depicts linear mixing: Light scattered by three materials 
in a scene is incident on a detector that measures radiance in B bands. The measured spectrum y G 
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is a weighted average of the material spectra. The relative amount of each material is represented by the 
associated weight. 

Conversely, nonhnear mixing is usually due to physical interactions between the light scattered by 
multiple materials in the scene. These interactions can be at a classical, or multilayered, level or at a 
microscopic, or intimate, level. Mixing at the classical level occurs when hght is scattered from one or 
more objects, is reflected off additional objects, and eventually is measured by hyperspectral imager. A 
nice illustrative derivation of a multilayer model is given by Borel and Gerstl [33] who show that the 
model results in an infinite sequence of powers of products of reflectances. Generally, however, the first 
order terms are sufficient and this leads to the bilinear model. Microscopic mixing occurs when two 
materials are homogeneously mixed [28]. In this case, the interactions consist of photons emitted from 
molecules of one material are absorbed by molecules of another material, which may in turn emit more 
photons. The mixing is modeled by Hapke as occurring at the albedo level and not at the reflectance 
level. The apparent albedo of the mixture is a linear average of the albedos of the individual substances 
but the reflectance is a nonlinear function of albedo, thus leading to a different type of nonlinear model. 

Fig. 3 illustrates two non-linear mixing scenarios: the left-hand panel represents an intimate mixture, 
meaning that the materials are in close proximity; the right-hand panel illustrates a multilayered scene, 
where there are multiple interactions among the scatterers at the different layers. 

Most of this overview is devoted to the linear mixing model. The reason is that, despite its simplicity, 
it is an acceptable approximation of the light scattering mechanisms in many real scenarios. Furthermore, 
in contrast to nonlinear mixing, the Unear mixing model is the basis of a plethora of unmixing models 
and algorithms spanning back at least 25 years. A samphng can be found in [1], [34]-[47]). Others will 
be discussed throughout the rest of this paper. 

B. Brief overview of nonlinear approaches 

Radiative transfer theory (RTT) [48] is a well estabhshed mathematical model for the transfer of 
energy as photons interacts with the materials in the scene. A complete physics based approach to 
nonlinear unmixing would require inferring the spectral signatures and material densities based on the 
RTT. Unfortunately, this is an extremely complex ill-posed problem, relying on scene parameters very hard 
or impossible to obtain. The Hapke [31], Kulbelka-Munk [49] and Shkuratov [50] scattering formulations 
are three approximations for the analytical solution to the RTT. The former has been widely used to study 
diffuse reflection spectra in chemistry [51] whereas the later two have been used, for example, in mineral 
unmixing applications [1], [52]. 

One wide class of strategies is aimed at avoiding the complex physical models using simpler but 
physics inspired models, such kernel methods. In [53] and following works [54]-[57], Broadwater et at. 
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have proposed several kernel-based unmixing algorithms to specifically account for intimate mixtures. 
Some of these kernels are designed to be sufficiently flexible to allow several nonUnearity degrees (using, 
e.g., radial basis functions or polynomials expansions) while others are physics-inspired kernels [55]. 

Conversely, bilinear models have been successively proposed in [58]-[62] to handle scattering effects, 
e.g., occurring in the multilayered scene. These models generaUze the standard linear model by introducing 
additional interaction terms. They mainly differ from each other by the additivity constraints imposed on 
the mixing coefficients [63]. 

However, limitations inherent to the unmixing algorithms that explicitly rely on both models are 
twofold. Firstly, they are not multipurpose in the sense that those developed to process intimate mixtures 
are inefficient in the multiple interaction scenario (and vice versa). Secondly, they generally require the 
prior knowledge of the endmember signatures. If such information is not available, these signatures have 
to be estimated from the data by using an endmember extraction algorithm. 

To achieve flexibility, some have resorted to machine learning strategies such as neural networks [64]- 
[70], to nonlinearly reduce dimensionality or learn model parameters in a supervised fashion from a 
collection of examples (see [35] and references therein). The polynomial post nonlinear mixing model 
introduced in [71] seems also to be sufficiently versatile to cover a wide class of nonlinearities. However, 
again, these algorithms assumes the prior knowledge or extraction of the endmembers. 

Mainly due to the difficulty of the issue, very few attempts have been conducted to address the 
problem of fully unsupervised nonhnear unmixing. One must still concede that a significant contribution 
has been carried by Heylen et al in [72] where a strategy is introduced to extract endmembers that 
have been nonUnearly mixed. The algorithmic scheme is similar in many respects to the well-known 
N-FINDR algorithm [73]. The key idea is to maximize the simplex volume computed with geodesic 
measures on the data manifold. In this work, exact geodesic distances are approximated by shortest-path 
distances in a nearest-neighbor graph. Even more recently, same authors have shown in [74] that exact 
geodesic distances can be derived on any data manifold induced by a nonlinear mixing model, such as 
the generalized bilinear model introduced in [62]. 

Quite recently. Close and Gader have devised two methods for fully unsupervised nonhnear unmixing 
in the case of intimate mixtures [75], [76] based on Hapke's average albedo model cited above. One 
method assumes that each pixel is either linearly or nonhnearly mixed. The other assumes that there 
can be both nonlinear and hnear mixing present in a single pixel. The methods were shown to more 
accurately estimate physical mixing parameters using measurements made by Mustard et al. [56], [57], 
[64], [77] than existing techniques. There is still a great deal of work to be done, including evaluating 
the usefulness of combining bilinear models with average albedo models. 

In summary, although researchers are beginning to expand more aggressively into nonlinear mixing. 
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the research is immature compared with Unear mixing. There has been a tremendous effort in the past 
decade to solve Unear unmixing problems and that is what will be discussed in the rest of this paper. 

C. Hyperspectral unmixing processing chain 

Fig. 4 shows the processing steps usually involved in the hyperspectral unmixing chain: atmospheric 
correction, dimensionality reduction, and unmixing, which may be tackled via the classical endmember 
determination plus inversion, or via sparse regression or sparse coding approaches. Often, endmember 
determination and inversion are implemented simultaneously. Below, we provide a brief characterization 
of each of these steps: 

1) Atmospheric correction. The atmosphere attenuates and scatterers the light and therefore affects 
the radiance at the sensor. The atmospheric correction compensates for these effects by converting 
radiance into reflectance, which is an intrinsic property of the materials. We stress, however, that 
linear unmixing can be carried out directly on radiance data. 

2) Data reduction. The dimensionality of the space spanned by spectra from an image is generally 
much lower than available number of bands. Identifying appropriate subspaces facilitates dimension- 
ality reduction, improving algorithm performance and complexity and data storage. Furthermore, 
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if the linear mixture model is accurate, the signal subspace dimension is one less than equal to the 
number of endmembers, a crucial figure in hyperspectral unmixing. 

3) Unmixing. The unmixing step consists of identifying the endmembers in the scene and the fractional 
abundances at each pixel. Three general approaches will be discussed here. Geometrical approaches 
exploit the fact that Unearly mixed vectors are in a simplex set or in a positive cone. Statistical 
approaches focus on using parameter estimation techniques to determine endmember and abundance 
parameters. Sparse regression approaches, which formulates unmixing as a linear sparse regression 
problem, in a fashion similar to that of compressive sensing [78], [79]. This framework relies on 
the existence of spectral libraries usually acquired in laboratory. A step forward, termed sparse 
coding [80], consists of learning the dictionary from the data and, thus, avoiding not only the need 
of libraries but also calibration issues related to different conditions under which the libraries and 
the data were acquired. 

4) Inversion. Given the observed spectral vectors and the identified endmembers, the inversion step 
consists of solving a constrained optimization problem which minimizes the residual between 
the observed spectral vectors and the linear space spanned by the inferred spectral signatures; 
the implicit fractional abundances are, very often, constrained to be nonnegative and to sum 
to one {i.e., they belong to the probability simplex). There are, however, many hyperspectral 
unmixing approaches in which the endmember determination and inversion steps are implemented 
simultaneously. 

The remainder of the paper is organized as follows. Section 2 describes the hnear spectral mixture 
model adopted as the baseline model in this contribution. Section 3 describes techniques for subspace 
identification. Sections 4, 5, 6, 7 describe four classes of techniques for endmember and fractional 
abundances estimation under the linear spectral unmixing. Sections 4 and 5 are devoted to the longstanding 
geometrical and statistical based approaches, respectively. Sections 6 and 7 are devoted to the recently 
introduced sparse regression based unmixing and to the exploitation of the spatial contextual information, 
respectively. Each of these sections introduce the underlying mathematical problem and summarizes state- 
of-the-art algorithms to address such problem. 

Experimental results obtained from simulated and real data sets illustrating the potential and limitations 
of each class of algorithms are described. The experiments do not constitute an exhaustive comparison. 
Both code and data for all the experiments described here are available at http://www.lx.it.pt/~bioucas/ 
code/unmixing_overview.zip. The paper concludes with a summary and discussion of plausible future 
developments in the area of spectral unmixing. 
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II. Linear MIXTURE MODEL 

If the multiple scattering among distinct endmembers is negligible and the surface is partitioned 
according to the fractional abundances, as illustrated in Fig. 2, then the spectrum of each pixel is 
well approximated by a linear mixture of endmember spectra weighted by the corresponding fractional 
abundances [1], [3], [29], [39]. In this case, the spectral measurement' at channel i € {1, . . . ,B} (B is 
the total number of channels) from a given pixel, denoted by jji, is given by the linear mixing model 
(LMM) 

p 

yi = ^Pi30ij + wi, (1) 

where pij > denotes the spectral measurement of endmember j G {1, . . . ,p} at i*'' the spectral band, 
Oj > denotes the fractional abundance of endmember j, Wi denotes an additive perturbation {e.g., 
noise and modeling errors), and p denotes the number of endmembers. At a given pixel, the fractional 
abundance aj, as the name suggests, represents the fractional area occupied by the jth endmember. 
Therefore, the fractional abundances are subject to the following constraints: 

Nonnegativity > 0, j = 1, . . . ,p 

(2) 

Sum- to-one Z^j=iQ'j = l; 
i.e., the fractional abundance vector o: = [ai, 02, . • • , oipY (the notation {■)'^ indicates vector transposed) 
is in the standard (p — l)-simplex (or unit {p— l)-simplex). In HU jargon, the nonnegativity and the sum- 
to-one constraints are termed abundance nonnegativity constraint (ANC) and abundance sum constraint 
(ASC), respectively. Researchers may sometimes expect that the abundance fractions sum to less than 
one since an algorithm may not be able to account for every material in a pixel; it is not clear whether 
it is better to relax the constraint or to simply consider that part of the modeling error. 

Let y G denote a B-dimensional column vector, and va.j = [pij,p2j, ■ ■ ■ ,PBj]^ denote the spectral 
signature of the jth endmember. Expression (1) can then be written as 

y = Ma + w, (3) 

where M = [mi, m2, . . . , nip] is the mixing matrix containing the signatures of the endmembers present 
in the covered area, and w = [wi, . . .,wb]'^. Assuming that the columns of IVl are affinely independent, 

i.e., m2 — mi, m3 — mi, . . . , mp — mi are linearly independent, then the set 

p 

C = {y = Ma : J]aj- = 1, a^- > 0, j = l,...,,p} 

i.e., the convex hull of the columns of M, is a (p — l)-simplex in M^. Fig. 5 illustrates a 2-simplex C 

'Although the type of spectral quantity (radiance, reflectance, etc.) is important when processing data, specification is not 
necessary to derive the mathematical approaches. 
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C = conv{M} 
=2-simplex 



y = Ma 



Fig. 5. Illustration of the simplex set C for p=3 (C is the convex hull of the columns of M, C — conv{M}). Green circles 
represent spectral vectors. Red circles represent vertices of the simplex and correspond to the endmembers. 

for a hypothetical mixing matrix M containing three endmembers. The points in green denote spectral 
vectors, whereas the points in red are vertices of the simplex and correspond to the endmembers. Note 
that the inference of the mixing matrix M is equivalent to identifying the vertices of the simplex C. This 
geometrical point of view, exploited by many unmixing algorithms, will be further developed in Sections 



Since many algorithms adopt either a geometrical or a statistical framework [34], [36], they are a 
focus of this paper. To motivate these two directions, let us consider the three data sets shown in Fig. 6 
generated under the linear model given in Eq. 3 where the noise is assumed to be negligible. The spectral 
vectors generated according to Eq. (3) are in a simplex whose vertices correspond to the endmembers. 
The left hand side data set contains pure pixels, i.e, for any of the p endmembers there is at least 
one pixel containing only the correspondent material; the data set in the middle does not contain pure 
pixels but contains at least p — I spectral vectors on each facet. In both data sets (left and middle), the 
endmembers may by inferred by fitting a minimum volume (MV) simplex to the data; this rather simple 
and yet powerful idea, introduced by Craig in his seminal work [81], underlies several geometrical based 
unmixing algorithms. A similar idea was introduced in 1989 by Perczel in the area of Chemometrics et 
al. [82]. 

The MV simplex shown in the right hand side example of Fig. 6 is smaller than the true one. This 
situation corresponds to a highly mixed data set where there are no spectral vectors near the facets. 
For these classes of problems, we usually resort to the statistical framework in which the estimation of 
the mixing matrix and of the fractional abundances are formulated as a statistical inference problem by 
adopting suitable probability models for the variables and parameters involved, namely for the fractional 
abundances and for the mixing matrix. 



IV-B. 
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Fig. 6. Illustration of the concept of simplex of minimum volume containing the data for three data sets. The endmembers in 
the left hand side and in the middle are identifiable by fitting a simplex of minimum volume to the data, whereas this is not 
applicable to the right hand side data set. The former data set correspond to a highly mixed scenario. 



A. Characterization of the Spectral Unmixing Inverse Problem 

Given the data set Y = [yi, . . . ,y„] G M^^" containing n i?-dimensional spectral vectors, the linear 
HU problem is, with reference to the linear model (3), the estimation of the mixing matrix M and of the 
fractional abundances vectors a j corresponding to pixels i = 1 , . . . , n. This is often a difficult inverse 
problem, because the spectral signatures tend to be strongly correlated, yielding badly-conditioned mixing 
matrices and, thus, HU estimates can be highly sensitive to noise. This scenario is illustrated in Fig. 7, 



uncertainty regions 
due to noise 




ni3 - 



Fig. 7. Illustration of a badly-conditioned mixing matrices and noise (represented by uncertainty regions) centered on clean 
spectral vectors represented by green circles. 



where endmembers m2 and ma are very close, thus yielding a badly-conditioned matrix M, and the 
effect of noise is represented by uncertainty regions. 

To characterize the linear HU inverse problem, we use the signal-to-noise-ratio (SNR) 

SNR = ^tll^ll'] = 

"E[||w||2] trace(R^)' 

where and R^, are, respectively, the signal (i.e., x = Mq) and noise correlation matrices and E 
denotes expected value. Besides SNR, we introduce the signal-to-noise-ratio spectral distribution (SNR- 
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SD) defined as 

SNR-SD(i) = ^^^^^ ,i = l,...,p, (4) 

where {\i,x,Gi,x) is the i^^ eigenvalue-eigenvector couple of R^; ordered by decreasing value of Aj,x- 
The ratio SNR-SD(i) yields the signal-to-noise ratio (SNR) along the signal direction Therefore, we 
must have SNR-SD(i) » 1 for z = 1, . . . ,p, in order to obtain acceptable unmixing results. Otherwise, 
there are directions in the signal subspace significantly corrupted by noise. 



SNR-SD 




eigen direction 

Fig. 8. Signal-to-noise-ratio spectral distribution (SNR-SD) for the data sets SudP5SNR40, SusgsP5SNR40, and Rcuprite. The 
first two are simulated and contain p = 5 endmembers and the third is a subset of the AVIRIS Cuprite data set. 

Fig. 8 plots SNR-SD (i), in the interval z = 1, . . . , 50, for the following data sets: 

• SudP5SNR40: simulated; mixing matrix M sampled from a uniformly distributed random variable 
in the interval [0, 1]; p = 5; n = 5000; fractional abundances distributed uniformly on the 4-unit 
simplex; SNR = 40dB. 

• SusgsP5SNR40: simulated; mixing matrix M sampled from the United States Geological Survey 
(USGS) spectral library^; p = 5; n = 5000; fractional abundances distributed uniformly on the 
4-unit simplex; SNR = 40dB. 

• Rcuprite: real; subset of the well-known AVIRIS cuprite data cube^ with size 250 lines by 191 
columns by 188 bands (noisy bands were removed). 

The signal and noise correlation matrices were obtained with the algorithms and code distributed with 
HySime [83]. From those plots, we read that, for SudP5SNR40 data set, SNR-SD(i) > 1 for i < 5 and 

^Available online from: http://speclab.cr.usgs.gov/spectral-lib.html 
^Available online from: http://aviris.jpl.nasa.gov/data/free_data.html 
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SNR-SD(z) < 1 for z > 5, indicating that the SNR is high in the signal subspace. For SusgsP5SNR40, 
the singular values of the mixing matrix decay faster due to the high correlation of the USGS spectral 
signatures. Nevertheless the "big picture" is similar to that of SudP5SNR40 data set. The Rcuprite data 
set yields the more difficult inverse problem because SNR-SD(z) has "close to convex shape" slowly 
approaching the value 1. This is a clear indication of a badly-conditioned inverse problem [84]. 

III. Signal subspace identification 

The number of endmembers present in a given scene is, very often, much smaller than the number 
of bands B. Therefore, assuming that the linear model is a good approximation, spectral vectors lie 
in or very close to a low-dimensional linear subspace. The identification of this subspace enables low- 
dimensional yet accurate representation of spectral vectors, thus yielding gains in computational time and 
complexity, data storage, and SNR. It is usually advantageous and sometimes necessary to operate on 
data represented in the signal subspace. Therefore, a signal subspace identification algorithm is required 
as a first processing step. 

Unsupervised subspace identification has been approached in many ways. Band selection or band 
extraction, as the name suggests, exploits the high correlation existing between adjacent bands to select 
a few spectral components among those with higher SNR [85], [86]. Projection techniques seek for the 
best subspaces to represent data by optimizing objective functions. For example, principal component 
analysis (PCA) [87] minimizes sums of squares; singular value decomposition (SVD) [88] maximizes 
power; projections on the first p eigenvectors of the empirical correlation matrix maximize likelihood, 
if the noise is additive and white and the subspace dimension is known to be p [88]; maximum noise 
fraction (MNF) [89] and noise adjusted principal components (NAPC) [90] minimize the ratio of noise 
power to signal power. NAPC is mathematically equivalent to MNF [90] and can be interpreted as a 
sequence of two principal component transforms: the first applies to the noise and the second applies to 
the transformed data set. MNF is related to SNR-SD introduced in (4). In fact, both metrics are equivalent 
in the case of white noise, i.e, Hyj = where I denotes the identity matrix. However, they differ 
when ^ a^I. 

The optical real-time adaptive spectral identification system (ORASIS) [91] framework, developed by U. 
S. Naval Research Laboratory aiming at real-time data processing, has been used both for dimensionahty 
reduction and endmember extraction. This framework consists of several modules, where the dimension 
reduction is achieved by identifying a subset of exemplar pixels that convey the variability in a scene. 
Each new pixel collected from the scene is compared to each exemplar pixel by using an angle metric. The 
new pixel is added to the exemplar set if it is sufficiently different from each of the existing exemplars. An 
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orthogonal basis is periodically created from the current set of exemplars using a modified Gram-Schmidt 
procedure [92]. 

The identification of the signal subspace is a model order inference problem to which information 
theoretic criteria like the minimum description length (MDL) [93], [94] or the Akaike information criterion 
(AIC) [95] comes to mind. These criteria have in fact been used in hyperspectral applications [96] adopting 
the approach introduced by Wax and Kailath in [97]. In turn, Harsanyi, Farrand, and Chang [98] developed 
a Neyman-Pearson detection theory-based thresholding method (HFC) to determine the number of spectral 
endmembers in hyperspectral data, referred to in [96] as virtual dimensionality (VD). The HFC method 
is based on a detector built on the eigenvalues of the sample correlation and covariance matrices. A 
modified version, termed noise- whitened HFC (NWHFC), includes a noise- whitening step [96]. HySime 
(hyperspectral signal identification by minimum error) [83] adopts a minimum mean squared error based 
approach to infer the signal subspace. The method is eigendecomposition based, unsupervised, and fully- 
automatic {i.e., it does not depend on any tuning parameters). It first estimates the signal and noise 
correlation matrices and then selects the subset of eigenvalues that best represents the signal subspace in 
the least square error sense. 

When the spectral mixing is nonlinear, the low dimensional subspace of the linear case is often replaced 
with a low dimensional manifold, a concept defined in the mathematical subject of topology [99]. A 
variety of local methods exist for estimating manifolds. For example, curvilinear component analysis 
[100], curvilinear distance analysis [101], manifold learning [102]-[107] are non-hnear projections based 
on the preservation of the local topology. Independent component analysis [108], [109], projection pursuit 
[110], [111], and wavelet decomposition [112], [113] have also been considered. 

A. Projection on the signal subspace 

Assume that the signal subspace, denoted by S, has been identified by using one of the above referred 
to methods and let the columns of E = [ei, . . . , e^] be an orthonormal basis for S, where G M^, for 
i = 1, . . . ,p. The coordinates of the orthogonal projection of a spectral vector y G onto S, with 
respect to the basis E, are given by = E-^y G W. Replacing y by the observation model (3), we 
have 

Ys = E^Ma -I- E^w. 

As referred to before, projecting onto a signal subspace can yield large computational, storage, and SNR 
gains. The first two are a direct consequence of the fact that p <^ B in most applications; to briefly 
explain the latter, let us assume that the noise w is zero-mean and has covariance a^I. The mean power 
of the projected noise term E-^w is then E||E^w|p = u^p (lE(-) denotes mean value). The relative 
attenuation of the noise power imphed by the projection is then p/B. 
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Fig. (9) illustrates the advantages of projecting the data sets on the signal subspace. The noise 
and the signal subspace were estimated with HySime [83]. The plot on the left hand side shows a 
noisy and the corresponding projected spectra taken from the simulated data set SusgsPSSNRSC*. The 
subspace dimension was correctly identified. The SNR of the projected data set is 46.6 dB, which is 
16.6 dB ~ (B/p) dB above to that of the noisy data set. The plot on the right hand side shows a noisy 
and the corresponding projected spectra from the Rcuprite data set. The identified subspace dimension 
has dimension 18. The SNR of the projected data set is 47.5 dB, which is 5dB above to that of the noisy 
data set. The colored nature of the additive noise explains the difference {B/p) dB — 5dB ~ 5dB. 

A final word of warning: although the projection of the data set onto the signal subspace often removes 
a large percentage of the noise, it does not improve the conditioning of the HU inverse problem, as this 
projection does not change the values of SNR-SD(i) for the signal subspace eigen-components. 

A possible line of attack to further reduce the noise in the signal subspace is to exploit spectral and 
spatial contextual information. We give a brief illustration in the spatial domain. Fig. 10, on the the 
top left hand side, shows the eigen-image no. 18, i.e., the image obtained from ejys for i = 18, of 
the Rcuprite data set. The basis of the signal subspace were obtained with the HySime algorithm. A 
filtered version using then BM3D [114] is shown on the top right hand side. The denoising algorithm 
is quite effective in this example, as confirmed by the absence of structure in the noise estimate (the 
difference between the noisy and the denoised images) shown in the bottom left hand side image. This 
effectiveness can also be perceived from the scatter plots of the noisy (blue dots) and denoised (green 

''Parameters of the simulated data set SusgsP5SNR30: mixing matrix M sampled from a uniformly distributed random variable 
in the interval [0, 1]; p = 5; n = 5000; fractional abundances distributed uniformly on the 4-unit simplex; SNR = 40dB. 
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Fig. 10. Top left: noisy eigen-image no. 18 of the Rcuprite data set. Top right: denoised no. 18; Bottom left: difference 
between noisy and denoised images. Botton right: scatter plots of the Eigen-image no. 17 and no. 18 of the Rcuprite data set 
(blue dots: noisy data; Green dots: denoised data). 



dots) eigen-images 17 and 18 shown in the bottom right hand side figure. The scatter plot corresponding 
to the denoised image is much more dense, reflecting the lower variance. 

B. Affine set projection 

From now on, we assume that the observed data set has been projected onto the signal subspace and, 
for simplicity of notation, we still represent the projected vectors as in (3), that is 

y = Ma + w, (5) 

where y, w G and M € M^^^. Since the columns of M belong to the signal subspace, the original 
mixing matrix is simply given by the matrix product EM. 
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a) orthogonal projection 



b) perspective projection 



Fig. 11. Projections of the observed data onto an hyperplane: a) Orthogonal projection on an hyperplane (the projected vectors 
suffers a rotation); b) Perspective projection (the scaling y/(y"^u) brings them to the hyperplane defined by y'^u = 1). 

Model (5) is a simplification of reality, as it does not model pixel-to-pixel signature variability. Signature 
variability has been studied and accounted for in a few unmixing algorithms (see, e.g., [1 15]-[1 18]), 
including all statistical algorithms that treat endmembers as distributions. Some of this variability is 
amplitude-based and therefore primarily characterized by spectral shape invariance [38]; i.e., while the 
spectral shapes of the endmembers are fairly consistent, their amplitudes are variable. This implies that 
the endmember signatures are affected by a positive scale factor that varies from pixel to pixel. Hence, 
instead of one matrix of endmember spectra for the entire scene, there is a matrix of endmember spectra 
for each pixel [s{i, l)mi, . . . , s{i,p)uip\ - IVISj for i = 1, . . . , n. In this case, and in the absence of 
noise, the observed spectral vectors are no longer in a simplex defined by a fixed set of endmembers but 
rather in the set 



as illustrated in Fig. 11. Therefore, the coefficients of the endmember spectra sjoc need not sum-to-one, 
although they are still nonnegative. Transformations of the data are required to improve the match of 
the model to reality. If a true mapping from units of radiance to reflectance can be found, then that 
transformation is sufficient. However, estimating that mapping can be difficult problem or impossible. 
Other methods can be applied to to ensure that the sum-to-one constraint is a better model, such as the 
following: 

a) Orthogonal projection: Use PCA to identify the affine set that best represent the observed data 
in the least squares sense and then compute the orthogonal projection of the observed vectors onto 
this set (see [119] for details). This projection is illustrated in Fig. 11. 

b) Perspective projection: This is the so-called dark point fixed transform (DPFT) proposed in [81]. 
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Fig. 12. Left (orthogonal projection): angles between projected and unprojected vectors. Right (perspective projection): scale 
factors between projected and unprojected vectors. 



For a given observed vector y, this projection, illustrated in Fig. 11, amounts to rescale y according 
to y/(y^u), where u is chosen such that y^u > for every y in the data set. The hyperplane 
containing the projected vectors is defined by v^u = 1, for any v G W. 

Notice that the orthogonal projection modifies the direction of the spectral vectors whereas the per- 
spective projection does not. On the other hand, the perspective projection introduces large scale factors, 
which may become negative, for spectral vectors close to being orthogonal to u. Furthermore, vectors 
u with different angles produce non-parallel affine sets and thus different fractional abundances, which 
implies that the choice of u is a critical issue for accurate estimation. 

These effects are illustrated in Fig. 12 for the Rterrain data set^. This is a publicly available hyperspec- 
tral data cube distributed by the Army Geospatial Center, United States Army Corps of Engineers, and 
was collected by the hyperspectral image data collection experiment (HYDICE). Its dimensions are 307 
pixels by 500 lines and 210 spectral bands. The figure on the left hand side plots the angles between the 
unprojected and the orthogonally projected vectors, as a function of the norm of the unprojected vectors. 
The higher angles, of the order of 1 — 7°, occur for vectors of small norm, which usually correspond to 
shadowed areas. The figure on the right hand side plots the norm of the projected vectors as a function 
of the norm of the unprojected vectors. The corresponding scale factors varies between, approximately, 
between 1/3 and 10. 

A possible way of mitigating these projection errors is discarding the problematic projections, which 
are vectors with angles between projected and unprojected vectors larger than a given small threshold, 

^ http ://w w w. age . army, mil/hy percube 
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in the case of the perspective projection, and vectors with very small or negative scale factors y-^u, in 
the case of the orthogonal projection. 

IV. Geometrical based approaches to linear spectral unmixing 

The geometrical-based approaches are categorized into two main categories: Pure Pixel (PP) based and 
Minimum Volume (MV) based. There are a few other approaches that will also be discussed. 

A. Geometrical based approaches: pure pixel based algorithms 

The pure pixel based algorithms still belong to the MV class but assume the presence in the data of 
at least one pure pixel per endmember, meaning that there is at least one spectral vector on each vertex 
of the data simplex. This assumption, though enabling the design of very efficient algorithms from the 
computational point of view, is a strong requisite that may not hold in many datasets. In any case, these 
algorithms find the set of most pure pixels in the data. They have probably been the most often used 
in hnear hyperspectral unmixing apphcations, perhaps because of their Ught computational burden and 
clear conceptual meaning. Representative algorithms of this class are the following: 

• The pixel purity index (PPI) algorithm [120], [121] uses MNF as a preprocessing step to reduce 
dimensionahty and to improve the SNR. PPI projects every spectral vector onto skewers, defined as 
a large set of random vectors. The points corresponding to extremes, for each skewer direction, are 
stored. A cumulative account records the number of times each pixel (i.e., a given spectral vector) 
is found to be an extreme. The pixels with the highest scores are the purest ones. 

• N-FINDR [73] is based on the fact that in spectral dimensions, the volume defined by a simplex 
formed by the purest pixels is larger than any other volume defined by any other combination of 
pixels. This algorithm finds the set of pixels defining the largest volume by inflating a simplex inside 
the data. 

• The iterative error analysis (lEA) algorithm [122] implements a series of linear constrained unmix- 
ings, each time choosing as endmembers those pixels which minimize the remaining error in the 
unmixed image. 

• The vertex component analysis (VCA) algorithm [123] iteratively projects data onto a direction 
orthogonal to the subspace spanned by the endmembers already determined. The new endmember 
signature corresponds to the extreme of the projection. The algorithm iterates until all endmembers 
are exhausted. 

• The simplex growing algorithm (SGA) [124] iteratively grows a simplex by finding the vertices 
corresponding to the maximum volume. 
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• The sequential maximum angle convex cone (SMACC) algorithm [125] is based on a convex cone 
for representing the spectral vectors. The algorithm starts with a single endmember and increases 
incrementally in dimension. A new endmember is identified based on the angle it makes with the 
existing cone. The data vector making the maximum angle with the existing cone is chosen as the 
next endmember to enlarge the endmember set. The algorithm terminates when all of the data vectors 
are within the convex cone, to some tolerance. 

• The alternating volume maximization (AVMAX) [126], inspired by N-FINDR, maximizes, in a cyclic 
fashion, the volume of the simplex defined by the endmembers with respect to only one endmember 
at one time. AVMAX is quite similar to the SC-N-FINDR variation of N-FINDR introduced in 
[127]. 

• The successive volume maximization (SVMAX) [126] is similar to VCA. The main difference 
concerns the way data is projected onto a direction orthogonal the subspace spanned by the endmem- 
bers already determined. VCA considers a random direction in these subspace, whereas SVMAX 
considers the complete subspace. 

• The collaborative convex framework [128] factorizes the data matrix Y into a nonnegative mixing 
matrix M and a sparse and also nonnegative abundance matrix S. The columns of the mixing matrix 
WL are constrained to be columns of the data Y. 

• Lattice Associative Memories (LAM) [129]-[131] model sets of spectra as elements of the lattice 
of partially ordered real-valued vectors. Lattice operations are used to nonlinearly construct LAMS. 
Endmembers are found by constructing so-called min and max LAMs from spectral pixels. These 
LAMs contain maximum and minimum coordinates of spectral pixels (after appropriate additive scal- 
ing) and are candidate endmembers. Endmembers are selected from the LAMS using the notions of 
affine independence and similarity measures such as spectral angle, correlation, mutual information, 
or Chebyschev distance. 

Algorithms AVMAX and SVMAX were derived in [126] under a continuous optimization framework 
inspired by Winter's maximum volume criterium [73], which underlies N-FINDR. Following a rigorous 
approach, Chan et al. not only derived AVMAX and SVMAX, but have also unveiled a number of hnks 
between apparently disparate algorithms such as N-FINDR and VCA. 

B. Geometrical based approaches: Minimum volume based algorithms 

The MV approaches seek a mixing matrix JVI that minimizes the volume of the simplex defined by its 
columns, referred to as conv(]VI), subject to the constraint that conv(]VI) contains the observed spectral 
vectors. The constraint can be soft or hard. The pure pixel constraint is no longer enforced, resulting 
in a much harder nonconvex optimization problem. Fig. 13 further illusttates the concept of simplex of 
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minimum size containing the data. The estimated mixing matrix M = [mi, ffi2, 1113] differs slightly from 
the true mixing matrix because there are not enough data points per facet (necessarily p — I per facet) 
to define the true simplex. 

Let us assume that the data set has been projected onto the signal subspace S, of dimension p, and 
that the vectors uii G MP, for i = 1, . . . ,p, are affinely independent (i.e., uii — mi, for z = 2, . . . ,p , are 
linearly independent). The dimensionality of the simplex conv(M) is therefore p — 1 so the volume of 
conv(M) is zero in R^. To obtain a nonzero volume, the extended simplex Mq = [0,M], containing the 
origin, is usually considered. We recall that the volume of conv(Mo), the convex hull of Mq, is given 
by 



y(Mo) = 



|det(M)| 



p\ 



(7) 



An alternative to (7) consists of shifting the data set to the origin and working in the subspace of 
dimension p — I. In this case, the volume of the simplex is given by 



V{M) 



1 



{p-iy. 



det 



1 

mi 



1 



m„ 



Craig's work [81], published in 1994, put forward the seminal concepts regarding the algorithms of MV 
type. After identifying the subspace and applying projective projection (DPFT), the algorithm iteratively 
changes one facet of the simplex at a time, holding the others fixed, such that the volume 



p\ 



is minimized and all spectral vectors belong to this simplex; i.e., M ^y^ ^ and IpM ^yj = 1 
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(respectively, ANC and ASC constraints^) for i = 1, . . . , n. In a more formal way: 
for t = I, . . . , 

Mo+^ = argminl/(Mo) 

Mo 

s.t: facets(Mo) = facets(M*), except for facet i = {t mod p) 
s.t: Vi ^ 0, iJm~ Vi = 1, for i = 1, . . . , n. 

A 

m. 
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Fig. 14. Noisy data. The dashed simplex represents the simplex of minimum volume required to contain all the data; by 
allowing violations to the positivity constraint, the MVSA and SISAL algorithms yield a simplex very close to the true one. 

The minimum volume simplex analysis (MVSA) [132] and the simplex identification via variable 
splitting and augmented Lagrangian (SISAL) [133] algorithms implement a robust version of the MV 
concept. The robustness is introduced by allowing the positivity constraint to be violated. To grasp the 
relevance of this modification, noisy spectral vectors are depicted in Fig. 14. Due to the presence of 
noise, or any other perturbation source, the spectral vectors may lie outside the true data simplex. The 
application of a MV algorithm would lead to the dashed estimate, which is far from the original. 

In order to estimate endmembers more accurately, MVSA/SISAL allows violations to the positivity 
constraint. Violations are penalized using the hinge function (hinge (x) = if x > and —x if x < 0). 
MVSA/SISAL project the data onto a signal subspace. Thus the representation of section III-B is used. 
Consequently, the matrix M is square and theoretically invertible (ill-conditioning can make it difficult 
to compute the inverse numerically). Furthermore, 

]VI"V = M"^(Mq! + w) = a + ]V["V. (8) 
*The notation Ip stands for a column vector of ones with size p. 
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MVSA/SISAL aims at solving the following optimization problem: 

Q = argmax log(|det(Q)|)-Al^hinge(QY)l„ (9) 
Q 

= argmin log(|det(M)|) + Al^hingefa + M~^w)ln (10) 
S.t: Ip Q = q^, 

where Q = M^^, q„j = 1^ with Yp being any set of of linearly independent spectral vectors taken 
from the data set Y = [yi, . . . ■,yn\, A is a regularization parameter, and n stands for the number of 
spectral vectors. 

We make the following two remarks: a) maximizing log(|det(Q)|) is equivalent to minimizing l/(Mo); 
b) the term — AlJ hinge(QY)l„ weights the ANC violations. As A approaches infinity, the soft constraint 
approaches the hard constraint. MVSA/SISAL optimizes by solving a sequence of convex optimization 
problems using the method of augmented Lagrange multipliers, resulting in a computationally efficient 
algorithm. 

The minimum volume enclosing simplex (MVES) [134] aims at solving the optimization problem 
(10) with A = oo, i.e., for hard positivity constraints. MVES implements a cychc minimization using 
Hnear programs (LPs). Although the optimization problem (10) is nonconvex, it is proved in (10) that 
the existence of pure pixels is a sufficient condition for MVES to identify the true endmembers. 

A robust version of MVES (RMVES) was recently introduced in [135]. RMVES accounts for the noise 
effects in the observations by employing chance constraints, which act as soft constraints on the fractional 
abundances. The chance constraints control the volume of the resulting simplex. Under the Gaussian noise 
assumption, RMVES infers the mixing matrix and the fractional abundances via alternating optimization 
involving quadratic programming solvers. 

The minimum volume transform-nonnegative matrix factorization (MVC-NMF) [136] solves the fol- 
lowing optimization problem applied to the original data set, i.e., without dimensionaUty reduction: 

(M,S) = arg min ]-\\Y - mS\\l + XV^ (M) (11) 

s.t: = M^O, S^O, 1^S = 1^, 

where S = [ai, . . . , OLn\ G W^^ is a matrix containing the fractional abundances ||A|||, = tr(A^A) is 
the Frobenius norm of matrix A and A is a regularization parameter. The optimization (11) minimizes 
a two term objective function, where the term (1/2)||Y — MS|||n measures the approximation error and 
the term y^(M) measures the square of the volume of the simplex defined by the columns of M. The 
regularization parameter A controls the tradeoff between the reconstruction errors and simplex volumes. 
MVC-NMF implements a sequence of alternate minimizations with respect to S (quadratic programming 
problem) and with respect to M (nonconvex programming problem). The major difference between 
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MVC-NMF and MVSA/SISAL/RMVES algorithms is that the latter allows violations of the ANC, thus 
bringing robustness to the SU inverse problem, whereas the former does not. 
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Fig. 15. Unmixing results of N-FINDR, VCA, MVC-NMF, and SISAL on different data sets: SusgsP5PPSNR30 - pure-pixel 
(top-left); SusgsP5SNR30 - non pure pixel (top right); SusgsP5MP08SNR30 - truncated fractional abundances (bottom left); 
SusgsP5XS10SNR30 - and highly mixed (bottom tight). 

The iterative constrained endmembers (ICE) algorithm [137] aims at solving an optimization problem 
similar to that of MVC-NMF, where the volume of the simplex is replaced by a much more manageable 
approximation: the sum of squared distances between all the simplex vertices. This volume regularizer is 
quadratic and well defined in any ambient dimension and in degenerated simplexes. These are relevant 
advantages over the |det(M)| regularizer, which is non-convex and prone to complications when the HU 
problem is badly conditioned or if the number of endmembers is not exactly known. Variations of these 
ideas have recently been proposed in [138], [139], [140], [141]. ICE implements a sequence of alternate 
minimizations with respect to S and with respect to M. An advantage of ICE over MVC-NMF, resulting 



25 



from the use of a quadratic volume regularizer, is that in the former one minimization is a quadratic 
programming problem while the other is a least squares problem that can be solved analytically, whereas 
in the MVC-NMF the optimization with respect to M is a nonconvex problem. The sparsity-promoting 
ICE (SPICE) [142] is an extension of the ICE algorithm that incorporates sparsity-promoting priors 
aiming at finding the correct number of endmembers. Linear terms are added to the quadratic objective 
function, one for all the proportions associated with one endmember. The linear term corresponds to an 
exponential prior. A large number of endmembers are used in the initialization. The prior tends to push 
all the proportions associated with particular endmembers to zero. If all the proportions corresponding to 
an endmember go to zero, then that endmember can be discarded. The addition of the sparsity promoting 
prior does not incur additional complexity to the model as the minimization still involves a quadratic 
program. 

The quadratic volume regularizer used in the ICE and SPICE algorithms also provides robustness in 
the sense of allowing data points to be outside of the simplex conv(M). It has been shown that the ICE 
objective function can be written in the following way: 

X(M,S) = ^^||Y-MS||| + //trace (Sm) (12) 
= ^l|Y-MS||| + /.f:ag 

b=l 



where Em is the sample covariance matrix of the endmembers and /j, G [0, 1] is a regularization parameter 
that controls the tradeoff between error and smaller simplexes. If ^ = I, then the best solution is to shrink 
all the endmembers to a single point, so all the data will be outside of the simplex. If /v, = 0, then the best 
solution is one that yields no error, regardless of the size of the simplex. The solution can be sensitive 
to the choice of /x. The SPICE algorithm has the same properties. Li versions also exist [143]. 

It is worth noting the both Heylen et al. [144] and Silvn-Crdenas [145] have reported geometric-based 
methods that can either search for or analytically solve for the fully constrained least squares solution. 

The Li/2-NMF method introduced in [146] formulates a nonnegative matrix factorization problem 
similar to (11), where the volume regularizer is replaced with the sparsity-enforcing regularizer ||S||i/2 = 
Sf=i Z]j=i By promoting zero or small abundance fractions, this regularizer pulls endmember 

facets towards the data cloud having an effect similar to the volume regularizer. The estimates of the 
endmembers and of the fractional abundances are obtained by a modification of the multiplicative update 
rules introduced in [147]. 

Convex cone analysis (CCA) [148], finds the boundary points of the data convex cone (it does not 
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apply affine projection), what is very close to MV concept. CCA starts by selecting the eigenvectors 
corresponding to the largest eigenvalues. These eigenvectors are then used as a basis to form hnear 
combinations that have only nonnegative elements, thus belonging to a convex cone. The vertices of the 
convex cone correspond to spectral vectors contains as many zero elements as the number of eigenvectors 
minus one. 

Geometric methods can be extended to piecewise hnear mixing models. Imagine the following scenario: 
An airborne hyperspectral imaging sensor acquires data over an area. Part of the area consists of farmland 
containing alternating rows of two types of crops (crop A and crop B) separated by soil whereas the other 
part consists of a village with paved roads, buildings (all with the same types of roofs), and non-deciduous 
trees. Spectra measured from farmland are almost all linear mixtures of endmember spectra associated 
with crop A, crop B, and soil. Spectra over the village are almost all linear mixtures of endmember 
spectra associated with pavement, roofs, and non-deciduous trees. Some pixels from the boundary of the 
village and farmland may be mixtures of all six endmember spectra. The set of all pixels from the image 
will then consist of two simplexes. Linear unmixing may find some, perhaps all, of the endmembers. 
However, the model does not accurately represent the true state of nature. There are two convex regions 
and the vertices (endmembers) from one of the convex regions may be in the interior of the convex hull 
of the set of all pixels. In that case, an algorithm designed to find extremal points on or outside the 
convex hull of the data will not find those endmembers (unless it fails to do what it was designed to do, 
which can happen). Relying on an algorithm failing to do what it is designed to do is not a desirable 
strategy. Thus, there is a need to devise methods for identifying multiple simplexes in hyperspectral data. 
One can refer to this class of algorithms as piecewise convex or piecewise linear unmixing. 

One approach to designing such algorithms is to represent the convex regions as clusters. This approach 
has been taken in [149]-[153]. The latter methods are Bayesian and will therefore be discussed in the next 
section. The first two rely on algorithms derived from fuzzy and possibihstic clustering. Crisp clustering 
algorithms (such as k-means) assign every data point to one and only one cluster. Fuzzy clustering 
algorithms allow every data point to be assigned to every cluster to some degree. Fuzzy clusters are 
defined by these assignments, referred to as membership functions. In the example above, there should 
be two clusters. Most points should be assigned to one of the two clusters with high degree. Points on 
the boundary, however, should be assigned to both clusters. 

Assuming that there are C simplexes in the data, then the following objective function can be used to 
attempt to find endmember spectra and abundances for each simplex: 

C ( N P-I P \ 

J = '^{'^u'fn\\yn-'M.iain\\l + ^ INiA; - ||i j (13) 
i=l \n=l k=lj=k+l J 
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such that 

"ifen > V/c = l,...,p; Z^fc=i "ifen = 1 
Uik>0 yk = l,...,p; Y7k=i'^ik = 

Here, Uin represents the membership of the n*'* data point in the i*'* simplex. The other terms are 
very similar to those used in the ICE/SPICE algorithms except that there are C endmember matrices and 
NC abundance vectors. Analytic update formulas can be derived for the memberships, the endmember 
updates, and the Lagrange multipliers. An update formula can be used to update the fractional abundances 
but they are sometimes negative and are then clipped at the boundary of the feasible region. One can 
still use quadratic programming to solve for them. As is the case for almost all clustering algorithms, 
there are local minima. However, the algorithm using all update formulas is computationally efficient. A 
robust version also exists that uses a combination of fuzzy and possibilistic clustering [151]. 

Fig. 15 shows results of pure pixel based algorithms (N-FINDR and VCA) and MV based algorithms 
(MVC-NMF and SISAL) in simulated data sets representative of the classes of problems illustrated 
in Fig. 6. These data sets have n = 5000 pixels and SNR = 30 dB and the following characteris- 
tics: SusgsP5PPSNR30 - pure pixels and abundances uniformly distributed over the simplex (top left); 
SusgsP5SNR30 non pure pixels and abundances uniformly distributed over the simplex (top right); 
SusgsP5MP08SNR30 abundances uniformly distributed over the simplex but truncated to 0.8 (bottom 
left); SusgsP5XS10SNR30 abundances with Dirichlet distributed with concentration parameter set to 10, 
thus yielding a highly mixed data set. 

In the top left data set all algorithm produced very good results because pure pixels are present. In 
the top right SISAL and MVC-NMF produce good results but VCA and N-FINDR shows a degradation 
in performance because there are no pure pixels. In the bottom left SISAL and MVC-NMF still produce 
good results but VCA and N-FINDR show a significant degradation in performance because the pixels 
close to the vertices were removed. Finally, in the bottom right all algorithm produce unacceptable results 
because there are no pixels in the vertex of the simplex neither on its facets. These data sets are beyond 
the reach of geometrical based algorithms. 

V. Statistical METHODS 

When the spectral mixtures are highly mixed, the geometrical based methods yields poor results 
because there are not enough spectral vectors in the simplex facets. In these cases, the statistical methods 
are a powerful alternative, which, usually, comes with price: higher computational complexity, when 
compared with the geometrical based approaches. Statistical methods also provide a natural framework for 
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representing variability in endmembers. Under the statistical framework, spectral unmixing is formulated 
as a statistical inference problem. 

Since, in most cases, the number of substances and their reflectances are not known, hyperspectral 
unmixing falls into the class of bhnd source separation problems [154]. Independent Component Analysis 
(ICA), a well-known tool in blind source separation, has been proposed as a tool to blindly unmix 
hyperspectral data [155]-[157]. Unfortunately, ICA is based on the assumption of mutually independent 
sources (abundance fractions), which is not the case of hyperspectral data, since the sum of abundance 
fractions is constant, implying statistical dependence among them. This dependence compromises ICA 
applicability to hyperspectral data as shown in [39], [158]. In fact, ICA finds the endmember signatures 
by multiplying the spectral vectors with an unmixing matrix which minimizes the mutual information 
among channels. If sources are independent, ICA provides the correct unmixing, since the minimum 
of the mutual information corresponds to and only to independent sources. This is no longer true for 
dependent fractional abundances. Nevertheless, some endmembers may be approximately unmixed. These 
aspects are addressed in [158]. 

Bayesian approaches have the ability to model statistical variability and to impose priors that can 
constrain solutions to physically meaningful ranges and regularize solutions. The latter property is 
generally considered to be a requirement for solving ill-posed problems. Adopting a Bayesian framework, 
the inference engine is the posterior density of the random quantities to be estimated. When the unknown 
mixing matrix M. and the abundance fraction matrix S are assumed to be a priori independent, the Bayes 
paradigm allows the joint posterior of M. and S to be computed as 

Pm,s\y{M, S| Y) = py|M,5(Y|M, S)pm{M)ps{S)/py{Y), (14) 

where the notation pA and pa\b stands for the probability density function (pdf) of A and of A given 
B, respectively. In (14), py\m.sO^\^-, S) is the likelihood function depending on the observation model 
and the prior distribution pm(M) and ps{S) summarize the prior knowledge regarding these unknown 
parameters. 

A popular Bayesian estimator is [159] the joint maximum a posteriori (MAP) estimator given by 

(M,S)map = argmaxpM,5|Y(M,S|Y) (15) 
= argmin-logpy|M,5(Y|]VI,S) - logpM(M) - logps(S). (16) 

Under the linear mixing model and assuming the noise random vector w is Gaussian with covariance 
matrix a% then, we have - logpy|M,5(Y|]V[, S) = {l/{2a^))\\Y - MSWjr + const. It is then clear that 
ICE/SPICE [142] and MVC-NMF [136] algorithms, which have been classified as geometrical, can also 
be classified as statistical, yielding joint MAP estimates in (15). In all these algorithms, the estimates are 
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obtained by minirnizing a two-term objective function: — logpy|jvf 5(Y|M, S) plays the role of a data 



fitting criterion and — logpM{^) — logp5(S) consists of a penalization. Conversely, from a Bayesian 
perspective, assigning prior distributions and ps{S) to the endmember and abundance matrices 

M and A, respectively, is a convenient way to ensure physical constraints inherent to the observation 
model. 

The work [160] introduces a Bayesian approach where the linear mixing model with zero-mean white 
Gaussian noise of covariance cr^I is assumed, the fractional abundances are uniformly distributed on the 
simplex, and the prior on M is an autoregressive model. Maximization of the negative log-posterior 
distribution is then conducted in an iterative scheme. Maximization with respect to the abundance 
coefficients is formulated as n weighted least square problems with linear constraints that are solved 
separately. Optimization with respect to M is conducted using a gradient-based descent. 

The Bayesian approaches introduced in [161]-[164] have all the same flavor. The posterior distribution 
of the parameters of interest is computed from the linear mixing model within a hierarchical Bayesian 
model, where conjugate prior distributions are chosen for some unknown parameters to account for 
physical constraints. The hyperparameters involved in the definition of the parameter priors are then 
assigned non-informative priors and are jointly estimated from the full posterior of the parameters and 
hyperparameters. Due to the complexity of the resulting joint posterior, deriving closed-form expressions 
of the MAP estimates or designing an optimization scheme to approximate them remain impossible. 
As an alternative, Markov chain Monte Carlo algorithms are proposed to generate samples that are 
asymptotically distributed according to the target posterior distribution. These samples are then used to 
approximate the minimum mean square error (MMSE) (or posterior mean) estimators of the unknown 
parameters 



These algorithms mainly differ by the choice of the priors assigned to the unknown parameters. More 
precisely, in [161], [165], spectral unmixing is conducted for spectrochemical analysis. Because of the 
sparse nature of the chemical spectral components, independent Gamma distributions are elected as priors 

for the spectra. The mixing coefficients are assumed to be non-negative without any sum-to-one constraint. 
Interest of including this additivity constraint for this specific application is investigated in [162] where 
uniform distributions over the admissible simplex are assigned as priors for the abundance vectors. Note 
that efficient implementations of both algorithms for operational applications are presented in [166] and 
[167], respectively. 

In [163], instead of estimating the endmember spectra in the full hyperspectral space, Dobigeon et 




(17) 



(18) 
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al. propose to estimate their projections onto an appropriate lower dimensional subspace that has been 
previously identified by one of the dimension reduction technique described in paragraph III-A. The main 
advantage of this approach is to reduce the number of degrees of freedom of the model parameters relative 
to other approaches, e.g., [161], [162], [165]. Accuracy and performance of this Bayesian unmixing 
algorithm when compared to standard geometrical based approaches is depicted in Fig. 16 where a 
synthetic toy example has been considered. This example is particularly illustrative since it is composed 
of a small dataset where the pure pixel assumption is not fulfilled. Consequently, the geometrical based 
approaches that attempt to maximize the simplex volume (e.g., VCA and N-FINDR) fail to recover the 
endmembers correctly, contrary to the statistical algorithm that does not require such hypothesis. 




-2.5 -2 -1.5 -1 -0.5 0.5 1 1.5 2 
Band #1 



Fig. 16. Projected pixels (black points), actual endmembers (black circles), endmembers estimated by N-FINDR (blue stars), 
endmembers estimated by VCA (green stars) and endmembers estimated by the algorithm in [163] (red stars 

Note that in [162], [163] and [164] independent uniform distributions over the admissible simplex are 
chosen as prior distributions for the abundance vectors. This assumption, which is equivalent of choosing 
Dirichlet distributions with all hyperparameters equal to 1, could seem to be very weak. However, as 
demonstrated in [163], this choice favors estimated endmembers that span a simplex of minimum volume, 
which is precisely the founding characteristics of some geometrical based unmixing approaches detailed 
in paragraph IV-B. 

Explicitly constraining the volume of the simplex formed by the estimated endmembers has also been 
considered in [164]. According to the optimization perspective suggested above, penalizing the volume 
of the recovered simplex can be conducted by choosing an appropriate negative log-prior — logpjv/(M). 
Arngren et al. have investigated three measures of this volume: exact simplex volume, distance between 
vertices, volume of a corresponding parallelepiped. The resulting techniques can thus be considered as 
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stochastic implementations of the MVC-NMF algorithm [136]. 

All the Bayesian unmixing algorithms introduced above rely on the assumption of an independent and 
identically Gaussian distributed noise, leading to a covariance matrix cr^I of the noise vector w. Note 
that the case of a colored Gaussian noise with unknown covariance matrix has been handled in [168]. 
However, in many applications, the additive noise term may neglected because the noise power is very 
small. When that is not the case but the signal subspace has much lower dimension than the number of 
bands, then, as seen in Section rH-A, the projection onto the signal subspace largely reduces the noise 
power Under this circumstance, and assuming that M G is invertible and the observed spectral 
vectors are independent, then we can write 



where p„ is the fractional abundance pdf, and compute the em maximum likelihood (ML) estimate of 
W = M^^. This is precisely the ICA line of attack, under the assumption that the fractional abundances 



tions [158] has promoted research on suitable statistical models for hyperspectral fractional abundances 
and in effective algorithms to infer the mixing matrices. This is the case with DECA [169], [170]; 
the abundance fractions are modeled as mixtures of Dirichlet densities, thus, automatically enforcing the 
constraints on abundance fractions imposed by the acquisition process, namely nonnegativity and constant 
sum. A cyclic minimization algorithm is developed where: 1) the number of Dirichlet modes is inferred 
based on the minimum description length (MDL) principle; 2) a generalized expectation maximization 
(GEM) algorithm is derived to infer the model parameters; 3) a sequence of augmented Lagrangian based 
optimizations are used to compute the signatures of the endmembers. 

Piecewise convex unmixing, mentioned in the geometrical approaches section, has also been investi- 
gated using a Bayesian approach' In [171] the normal compositional model is used to represent each 
convex set as a set of samples from a collection of random variables. The endmembers are represented as 
Gaussians. Abundance multinomials are represented by Dirichlet distributions. To form a Bayesian model, 
priors are used for the parameters of the distributions. Thus, the data generation model consists of two 
stages. In the first stage, endmembers are sampled from their respective Gaussians. In the second stage, 
for each pixel, an abundance multinomial is sampled from a Dirichlet distribution. Since the number of 
convex sets is unknown, the Dirichlet process mixture model is used to identify the number of clusters 
while simultaneously learning the parameters of the endmember and abundance distributions. This model 




are independent, i.e., pa = YVk=iPak- The fact that this assumption is not valid in hyperspectral applica- 



^It is an interesting to remark that by taking the negative of the logarithm of a fuzzy clustering objective function, such as 
in Eq. 13, one can represent a fuzzy clustering objective as a Bayesian MAP objective. One interesting difference is that the 
precisions on the likelihood functions are the memberships and are data point dependent. 
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Fig. 17. Left: Scatterplot of the SusgsP3SNRinfXSmix dataset jointly with the true and estimated endmembers. Right: 
Scatterplot of a Cuprite data subset jointly with the projections of Montmorillonite, Desert Varnish, and Alunite, witch are 
Imown to dominate this subset, and estimated endmembers. 



is very general and can represent very complex data sets. The Dirichlet process uses a Metropolis-within- 
Gibbs method to estimate the parameters, which can be quite time consuming. The advantage is that the 
sampler will converge to the joint distribution of the parameters, which means that one can select the 
maximum a-posterior estimates from the estimated joint distributions. Although Gibbs samplers seem 
inherently sequential, some surprising new theoretical results by [172] show that theoretically correct 
sampling samplers can be implemented in parallel, which offers the promise of dramatic speed-ups of 
algorithms such as this and other probabilistic algorithms mentioned here that rely on sampling. 

Fig. 17, left, presents a scatterplot of the simulated data set SusgsP3SNRinfXSmix and the endmember 
estimates produced by VCA, MVES, MVSA, MVC-NMF, SISAL, and DECA algorithms. This data set 
is generated with a mixing matrix M sampled from the USGS library and with p = 3 endmembers, 
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Fig. 18. AVIRIS subset and of 30 (wavelength A = 667. 3nm) used to compute the results plotted in Fig. 17, right. 

n = 10000 spectral vectors, and fractional abundances given by mixtures of two Dirichlet modes with 
parameters [6, 25, 9] and [7, 8, 23] and mode weights of 0.67 and 0.33, respectively. DECA Dirichlet 
parameters were randomly initialized and the mixing probabilities uniformly. This setting reflects a 
situation in which no knowledge of the size and the number of regions in the scene exists. The parameters 
of the remaining methods were hand tuned for optimal performance*. See [170] for more details. The 
considered data set corresponds to a highly mixed scenario, where the geometrical based algorithms 
performs poorly, as explained in Section IV. On the contrary, DECA yields useful estimates. 

Fig. 17, right, is similar the one in the left hand side for a Cuprite data subset of size 50 x 90 
pixels shown in Fig. 18. This subset is dominated by Montmorillonite, Desert Varnish, and Alunite, 
which are known to dominate the considered subset image [6]. The projections of this endmembers 
are represented by black circles. DECA identified k = 5 modes, with parameters 6i = [1.5, 4.1, 2.9], 
02 = [23.4, 51.3, 15.5], 63 = [27.2, 26.6, 4.3], 9^ = [17.5, 3.6, 2.5], and 65 = [10.3, 8.0, 7.3], and 
mode weights ei = 0.04, €2 = 0.69, es = 0.07, 64 = 0.10, and 65 = 0.10. These parameters correspond 
to a highly non-uniform distribution over the simplex as could be inferred from the scatterplot. Although 
the estimation results are more difficult to judge in the case of real data than in the case on simulated 
data, as we not really sure about the true endmembers, it is reasonable to conclude that the statistical 
approach is producing similar to or better estimates than the geometrical based algorithms. 

The examples shown Fig. 17 illustrates the potential and flexibility of the Bayesian methodology. As 
already referred to above, these advantages come at a price: computational complexity linked to the 

^MVC-MNF regularization parameter: r = 1"*; MVES tolerance: 10"''; SISAL regularization parameter A = 10; SPICE 
regularization parameter fi = 10~^; SPICE sparsity parameter F = 0.5; SPICE stopping parameter 10~®. 
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posterior computation and to the inference of the estimates. 

VI. Sparse regression based unmixing 

The spectral unmixing problem has recently been approached in a semi- supervised fashion, by assuming 
that the observed image signatures can be expressed in the form of hnear combinations of a number of 
pure spectral signatures known in advance [173]-[175] (e.g., spectra collected on the ground by a field 
spectro-radiometer). Unmixing then amounts to finding the optimal subset of signatures in a (potentially 
very large) spectral library that can best model each mixed pixel in the scene. In practice, this is a 
combinatorial problem which calls for efficient linear sparse regression techniques based on sparsity- 
inducing regularizers, since the number of endmembers participating in a mixed pixel is usually very 
small compared with the (ever-growing) dimensionality and availability of spectral libraries [1]. Linear 
sparse regression is an area of very active research with strong links to compressed sensing [79], [176], 
[177], least angle regression [178], basis pursuit, basis pursuit denoising [179], and matching pursuit 
[180], [181]. 

Let us assume then that the spectral endmembers used to solve the mixture problem are no longer 
extracted nor generated using the original hyperspectral data as input, but instead selected from a library 
A G M^^"^ containing a large number of spectral samples, say m, available a priori. In this case, 
unmixing amounts to finding the optimal subset of samples in the library that can best model each mixed 
pixel in the scene. Usually, we have m> B and therefore the linear problem at hands is underdetermined. 
Let X G denote the fractional abundance vector with regards to the Ubrary A. With these definitions 
in place, we can now write our sparse regression problem as 

min||x||o subject to 11 Ax — ylU < (5, x ^ 0, (19) 

X 

where ||x||o denotes the number of non-zero components of x and 5 > is the error tolerance due to 
noise and modeling errors. Assume for a while that (5 = 0. If the system of hnear equations Ax = y has 
a solution satisfying 2 ||x||o < spark(A), where spark(A) < rankA-|- 1 is the smallest number of linearly 
dependent columns of A, it is necessarily the unique solution of (19) [182]. For (5 > 0, the concept of 
uniqueness of the sparsest solution is replaced with the concept of stability [176]. 

In most HU applications, we do have 2 ||x||o ^ spark(A) and therefore, at least in noiseless scenarios, 
the solutions of (19) are unique. However, problem (19) is NP-hard [183] and therefore there is no hope 
in solving it in a straightforward way. Greedy algorithms such as the orthogonal matching pursuit (OMP) 
[181] and convex approximations replacing the the ^Q norm with the norm, termed basis pursuit (BP), 
if (5 = 0, and basis pursuit denoising (BPDN ) [179], if (5 > 0, are alternative approaches to compute the 
sparsest solution. If we add the ANC to BP and BPDN problems, we have the constrained basis pursuit 
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(CBP) and the constrained basis pursuit denoising (CBPDN) problems [184], respectively. The CBPDN 
optimization problem is thus 

min||x||i subject to 11 Ax — ylh < (5, x ^ 0. (20) 

X 

An equivalent formulation to (20), termed constrained sparse regression (CSR) (see [184]), is 

min (1/2)11 Ax -y||2 + A||x||i subject to x ^ 0, (21) 

X 

where A > is related with the Lagrange multiplier of the inequality ||Ax — y||2 < 6, also interpretable 
as a regularization parameter. 

Contrary to the problem (19), problems (20) and (21) are convex and can be solved efficiently 
[184], [185]. What is, perhaps, totally unexpected is that sparse vector of fractional abundances can 
be reconstructed by solving (20) or (21) provided that the columns of matrix A are incoherent in a 
given sense [186]. The applicability of sparse regression to HU was studied in detail in [173]. Two main 
conclusions were drawn: 

a) hyperspectral signatures tend to be highly correlated what imposes limits to the quality of the 
results provided by solving CBPDN or CSR optimization problems. 

b) The hmitation imposed by the highly correlation of the spectral signatures is mitigated by the 
high level of sparsity most often observed in the hyperspectral mixtures. 

At this point, we make a brief comment about the role of ASC in the context of CBPDN and of CSR 
problems. Notice that if x belongs to the unit simplex (i.e., Xj > for z = 1, . . . , m, and YlT^i = 
we have ||x||i = 1. Therefore, if we add the sum-to-one constraint to (20) and (21), the corresponding 
optimization problems do not depend on the ii norm ||x||i. In this case, the optimization (21) is converted 
into the well known fully constrained least squares (FCLS) problem and (20) into a feasibility problem, 
which for (5 = is 

solution {Ax = y} subject to x ^ 0. (22) 

The uniqueness of sparse solutions to (22) when the system is underdetermined is addressed in [187]. 
The main finding is that for matrices A with a row- span intersecting the positive orthant (this is the case 
of hyperspectral libraries), if this problem admits a sufficiently sparse solution, it is necessarily unique. 

It is remarkable how the ANC alone acts as a sparsity-inducing regularizer. 

In practice, and for the reasons pointed Section 111-B, the ASC is rarely satisfied. For this reason, and 
also due to the presence of noise and model mismatches, we have observed that the CBPDN and CSR 
often yields better unmixing results than CLS and FCLS. 

In order to illustrate the potential of the sparse regression methods, we run an experiment with simulated 
data. The hyperspectral library A, of size L = 224 and m = 342, is a pruned version of the USGS hbrary 
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Fig. 19. Sparse reconstruction results for a simulated data set generated from the USGS library. Top: Signal to reconstruction 
error (SRE) as a function of the number of active materials. Bottom: Number of incorrect selected material as a function of the 
number of active materials. 



in which the angle between any two spectral signatures is no less than 0.05 rad (approximately 3 degrees). 
The abundance fractions follows a Dirichlet distribution with constant parameter of value 2, yielding a 
mixed data set beyond the reach of geometrical algorithms. In order to put in evidence the impact of 
the angles between the library vectors, and therefore the mutual coherence of the library [187], in the 
unmixing results, we organize the library into two subsets; the minimum angle between any two spectral 
signatures is higher the 7° degrees in the first set and lower than 4° in the second set. 

Fig. 19 top, plots unmixing results obtained by solving the CSR problem (21) with the SUNSAL 
algorithm introduced in [184]. The regularization parameter A was hand tuned for optimal performance. 
For each value of the abscissa ||x||o, representing the number of active columns of A, we select ||x||o 
elements of one of the subsets above referred to and generate n = 1000 Dirichlet distributed mixtures. 
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From the sparse regression results, we estimate the signal-to-reconstruction error (SRE) as 

SRE(dB).101og„(j5ia^). 

where x and (•) stand for estimated abundance fraction vector and sample average, respectively. 

The curves on the top left hand side were obtained with the noise set to zero. As expected there is a 
degradation of performance as ||x||o increases and Om decreases. Anyway, the obtained values of SRE 
correspond to an almost perfect reconstruction for 9m{-^) > 7°. For 6m{A.) < 3° the reconstruction is 
almost perfect for ||x||o < 20, as well, and of good quahty for most unmixing purposes for for ||x||o > 20. 

The curves on the top right hand side were obtained with SNR= 25 dB. This scenario is much more 
challenging than the previous one. Anyway, even for Omin < 4°, we get SRE> 10 dB for, ||x||o < 5, 
corresponding to a useful performance in HU applications. Notice that best values of SRE for 6min < 4° 
are obtained with A = 5 x 10~^, putting in evidence the regularization effect of the £i norm in the CSR 
problem (21), namely when the spectral are strongly coherent. 

The curves on the bottom plot the number of incorrect selected materials for SNR= 25 dB. This 
number is zero for SNR= oo. For each value of ||x||o, we compare the ||x||o larger elements of x 
with the true ones and count the number of mismatches. We conclude that a suitable setting of the 
regularization parameter yields a correct selection of the materials for ||x||o < 8. 



The success of hyperspectral sparse regression relies crucially on the availability of suitable hyper- 
spectral libraries. The acquisition of these libraries is often a time consuming and expensive procedure. 
Furthermore, because the libraries are hardly acquired under the same conditions of the data sets under 
consideration, a delicate calibration procedure have to be carried out to adapt either the library to the data 
set or vice versa [173]. A way to sidestep these difficulties is the learning of the libraries directly from the 
dataset with no other a priori information involved. For the application of these ideas, frequetly termed 
dictionary learning, in signal and image processing see, e.g.,, [188], [189] and references therein). Charles 
et al. have recently applied this hne of attack to sparse HU in [190]. They have modified an existing 
unsupervised learning algorithm to learn an optimal library under the sparse representation moldel. Using 
this learned library they have shown that the sparse representation model learns spectral signatures of 
materials in the scene and locally approximates nonlinear manifolds for individual materials. 

VII. Spatial-spectral contextual information 

Most of the unmixing strategies presented in the previous paragraphs are based on a objective cri- 
terion generally defined in the hyperspectral space. When formulated as an optimization problem (e.g., 
implemented by the geometrical-based algorithms detailed in Section IV), spectral unmixing usually 
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relies on algebraic constraints that are inherent to the observation space M^: positivity, additivity and 
minimum volume. Similarly, the statistical- and sparsity-based algorithms of Sections V and VI exploit 
similar geometric constraints to penalize a standard data-fitting term (expressed as a hkehhood function 
or quadratic error term). As a direct consequence, all these algorithms ignore any additional contextual 
information that could improve the unmixing process. However, such valuable information can be of 
great benefit for analyzing hyperspectral data. Indeed, as a prototypal task, thematic classification of 
hyperspectral images has recently motivated the development of a new class of algorithms that exploit 
both the spatial and spectral features contained in image. Pixels are no longer processed individually but 
the intrinsic 3D nature of the hyperspectral data cube is capitalized by taking advantage of the correlations 
between spatial and spectral neighbors (see, e.g. [191]-[198]). 

Following this idea, some unmixing methods have targeted the integration of contextual information to 
guide the endmember extraction and/or the abundance estimation steps. In particular, the Bayesian esti- 
mation setting introduced in Section V provides a relevant framework for exploiting spatial information. 
Anecdotally, one of the earliest work dealing with linear unmixing of multi-band images (casted as a soft 
classification problem) explicitly attempts to highlight spatial correlations between neighboring pixels. 
In [199], abundance dependencies are modeled using Gaussian Markov random fields, which makes this 
approach particularly well adapted to unmix images with smooth abundance transition throughout the 
observed scene. 

In a similar fashion, Eches et al. have proposed to exploit the pixel correlations by using an underlying 
membership process. The image is partitioned into regions where the statistical properties of the abundance 
coefficients are homogeneous [200]. A Potts-Markov random field has been assigned to hidden labeling 
variables to model spatial dependencies between pixels within any region. It is worthy to note that, 
conditionally upon a given class, unmixing is performed on each pixel individually and thus generahzes 
the Bayesian algorithms of [201]. In [200], the number of homogeneous regions that compose the image 
must be chosen and fixed a priori. An extension to a fully unsupervised method, based on nonparametric 
hidden Markov models, have been suggested by Mittelman et al. in [202]. 

Several attempts to exploit spatial information have been also made when designing appropriate criteria 
to be optimized. In addition to the classical positivity, full additivity and minimum volume constraints, 
other penalizing terms can be included in the objective function to take advantage of the spatial structures 
in the image. In [203], the spatial autocorrelation of each abundance is described by a measure of 
spatial complexity, promoting these fractions to vary smoothly from one pixel to its neighbors (as in 
[199]). Similarly, in [204], spatial information has been incorporated within the criterion by including 
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a regularization term that takes into account a weighted combination of the abundances related to the 
neighboring pixels. Other optimization algorithms operate following the same strategy (see for examples 
[205]-[207]). 

Extended morphological operations have been also used as a baseline to develop an automatic mor- 
phological endmember extraction (AMEE) algorithm [208] for spatial-spectral endmember extraction. 
Spatial averaging of spectrally similar endmember candidates found via singular value decomposition 
(SVD) was used in the spatial spectral endmember extraction (SSEE) algorithm [209]. Recently, a spatial 
preprocessing (SPP) algorithm [210] has been proposed which estimates, for each pixel vector in the scene, 
a spatially-derived factor that is used to weight the importance of the spectral information associated to 
each pixel in terms of its spatial context. The SPP is intended as a preprocessing module that can be 
used in combination with an existing spectral-based endmember extraction algorithm. 

Finally, we mention very recent research directions aiming at exploiting contextual information under 
the sparse regression framework. Work [185] assumes that the endmembers are known and formulates 
a deconvolution problem, where a Total Variation regularizer [211] is apphed to the spatial bands to 
enhance their resolution. Work [212] formulates the HU problem as nonconvex optimization problem 
similar to the nonnegative matrix factorization (11), where the volume regularization term is replaced 
with an ii regularizer applied to differences between neighboring vectors of abundance fractions. The 
hmitation imposed to the sparse regression methods by the usual high correlation of the hyperspectral 
signatures is mitigated in [213], [214] by adding the Total Variation [211] regularization term, applied to 
the individual bands, to CSR problem (21). A related approach is followed in [215]; here a collaborative 
regularization term [216] is added to CSR problem (21) to enforce the same set of active materials in 
all pixels of the data set. 

VIII. Summary 

More than one decade after Keshava and Mustard's tutorial paper on spectral unmixing pubhshed in the 
IEEE Signal Processing Magazine [1], effective spectral unmixing still remains an elusive exploitation 
goal and a very active research topic in the remote sensing community. Regardless of the available 
spatial resolution of remotely sensed data sets, the spectral signals collected in natural environments 
are invariably a mixture of the signatures of the various materials found within the spatial extent 
of the ground instantaneous field view of the remote sensing imaging instrument. The availability of 
hyperspectral imaging instruments with increasing spectral resolution (exceeding the number of spectral 
mixture components) has fostered many developments in recent years. In order to present the state-of-the- 
art and the most recent developments in this area, this paper provides an overview of recent developments 
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in hyperspectral unmixing. Several main aspects are covered, including mixing models (linear versus 
nonlinear), signal subspace identification, geometrical-based spectral unmixing, statistical-based spectral 
unmixing, sparse regression-based unmixing and the integration of spatial and spectral information for 
unmixing purposes. In each topic, we describe the physical or mathematical problems involved and many 
widely used algorithms to address these problems. Because of the high level of activity and Umited space, 
there are many methods that have not been addressed directly in this manuscript. However, combined, 
the topics mentioned here provide a snapshot of the state-of-the-art in the area of spectral unmixing, 
offering a perspective on the potential and emerging challenges in this strategy for hyperspectral data 
interpretation. The compendium of techniques presented in this work reflects the increasing sophistication 
of a field that is rapidly maturing at the intersection of many different disciplines, including signal and 
image processing, physical modeling, linear algebra and computing developments. 

In this regard, a recent trend in hyperspectral imaging in general (and spectral unmixing in particular) 
has been the computationally efficient implementation of techniques using high performance computing 
(HPC) architectures [217]. This is particularly important to address applications of spectral unmixing with 
high societal impact such as, monitoring of natural disasters (e.g., earthquakes and floods) or tracking 
of man-induced hazards (e.g., oil spills and other types of chemical contamination). Many of these 
applications require timely responses for swift decisions which depend upon (near) real-time performance 
of algorithm analysis [218]. Although the role of different types of HPC architectures depends heavily 
on the considered apphcation, cluster-based parallel computing has been used for efficient information 
extraction from very large data archives using spectral unmixing techimiques [219], while on-board and 
real-time hardware architectures such as field programmable gate arrays (FPGAs) [220] and graphics 
processing units (GPUs) [221] have also been used for efficient implementation and exploitation of 
spectral unmixing techniques. The HPC techniques, together with the recent discovery of theoretically 
correct methods for parallel Gibbs samplers and further coupled with the potential of the fully stochastic 
models represents an opportunity for huge advances in multi-modal unmixing. That is, these developments 
offer the possibihty that complex hyperspectral images that contain that can be piecewise linear and 
nonlinear mixtures of endmembers that are represented by distributions and for which the number of 
endmembers in each piece varies, may be accurately processed in a practical time. 

There is a great deal of work yet to be done; the list of ideas could be several pages long! A few 
directions are mentioned here. Proper representations of endmember distributions need to be identified. 
Researchers have considered some distributions but not all. Furthermore, it may become necessary to 
include distributions or tree structured representations into sparse processing with hbraries. As images 
cover larger and larger areas, piecewise processing will become more important since such images will 
cover several different types of areas. Furthermore, in many of these cases, linear and nonlinear mixing 
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will both occur. Random fields that combine spatial and spectral information, manifold approximations 
by mixtures of low rank Gaussians, and model clustering are all methods that can be investigated for 
this purpose. Finally, software tools and measurements for large scale quantitative analysis are needed to 
perform meaningful statistical analyses of algorithm performance. 
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